An Original Semantics to Keyword Queries for XML Using Structural Patterns

Authors

  • Dimitri Theodoratos
  • Xiaoying Wu
Abstract

XML is by now the de facto standard for exporting and exchanging data on the web. The need to query XML data sources whose structure is not fully known to the user, and the need to integrate multiple data sources with different tree structures, have recently motivated keyword-based techniques for querying XML documents. The semantics adopted by these approaches aim to restrict the answers to meaningful ones. However, these approaches suffer from low precision, while recent ones with improved precision suffer from low recall. In this paper, we introduce an original approach for assigning semantics to keyword queries for XML documents. We exploit index graphs (a structural summary of the data) to extract tree patterns that return meaningful answers. In contrast to previous approaches, which operate locally on the data to compute meaningful answers (usually by computing lowest common ancestors), our approach operates globally on index graphs to detect and exploit meaningful tree patterns. We implemented and experimentally evaluated our approach on DBLP-based data sets with irregularities. A comparison with previous approaches shows that it finds all the meaningful answers where the others fail (perfect recall). Further, it outperforms approaches with similar recall in excluding meaningless answers (better precision). Since our approach is based on tree-pattern query evaluation, it can be easily implemented on top of an XQuery engine.
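To make the idea of a structural summary concrete, here is a minimal sketch in Python. It is not the index graph construction of the paper, whose definition the abstract does not give. It assumes a simple path-based summary in which every distinct root-to-node label path becomes one summary node. The function names `build_path_summary` and `paths_matching`, and the sample document, are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_summary(xml_text):
    """Map each distinct root-to-node label path to the text values found at it.

    Assumption: a label path stands in for one node of a structural summary.
    """
    root = ET.fromstring(xml_text)
    summary = defaultdict(list)

    def visit(node, path):
        path = path + (node.tag,)
        summary[path].append((node.text or "").strip())
        for child in node:
            visit(child, path)

    visit(root, ())
    return dict(summary)


def paths_matching(summary, keyword):
    """Return the summary paths whose last label or text values contain the keyword."""
    kw = keyword.lower()
    return sorted(
        path
        for path, texts in summary.items()
        if kw in path[-1].lower() or any(kw in t.lower() for t in texts)
    )


# Hypothetical DBLP-like sample with two different element structures.
SAMPLE = """<dblp>
  <article><author>Wu</author><title>XML search</title></article>
  <book><author>Wu</author><title>Databases</title></book>
</dblp>"""

summary = build_path_summary(SAMPLE)
author_paths = paths_matching(summary, "Wu")
for path in author_paths:
    print("/".join(path))
```

Working on the summary rather than on the data means a keyword is located once per distinct path instead of once per matching element, which is the kind of global view the abstract contrasts with local lowest-common-ancestor computation.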

Similar resources

XReason: A Semantic Approach That Reasons with Patterns to Answer XML Keyword Queries

Keyword search is a popular technique which allows querying multiple data sources on the web without having full knowledge of their structure. This flexibility comes with a drawback: usually, even though a large number of results match the user’s request only few of them are relevant to her intent. Since data on the web are often in tree-structured form, several approaches have been suggested i...

Using Semantics in Xml Data Management

XML is emerging as a de facto standard for information exchange over the Web, while businesses and enterprises generate and exchange large amounts of XML data daily. One of the major challenges is how to query this data efficiently. Queries typically can be represented as twig patterns. Some researchers have developed algorithms that reduce the intermediate results that are generated during que...

From Structure-Based to Semantics-Based: Towards Effective XML Keyword Search

Existing XML keyword search approaches can be categorized into tree-based search and graph-based search. Both of them are structure-based search because they mainly rely on the exploration of the structural features of document. Those structure-based approaches cannot fully exploit hidden semantics in XML document. This causes serious problems in processing some class of keyword queries. In thi...

EXTRUCT: Using Deep Structural Information in XML Keyword Search

Users who are unfamiliar with database query languages can search XML data sets using keyword queries. Previous work has shown that current XML keyword search methods, although intuitive, do not effectively use the data’s structural information and provide poor precision, recall, and ranking for most queries. Based on an extension of the concept of information theory, we have developed principl...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As moving to big data world where data is increasing in unstructured way with high velocity, there is a need of data-store to store this bundle amount of data. Traditionally, relational databases are used which are now not compatible to handle this large amount of data, so it is needed to move on to non-relational data-stores. In the current study, we have proposed an extension of the Mongo...

Publication date: 2007